{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "SFA6R-4jL7SS"
   },
   "source": [
    "# Synthetic Data Generator Notebook\n",
    "## About\n",
    "This colab notebook demonstrates the use of Frontier and Open-source LLM models for generating synthetic dataset for a business scenario provided by the user. From a UI interface implemented in gradio, a user can define their business scenario in detail, select the number of records needed along with the its format and adjust the number of max output tokens to be generated by the chosen LLM.\n",
    "\n",
    "It does not stop here. Once the records have been produced in the LLM output, it can be extracted and stored in a file, format same as set by user before. The file is stored in colab notebook under the contents directory. All of this is extraction is done with the help of the 're' library. My first time using it and I totally enjoyed learning it.\n",
    "\n",
    "## Outlook\n",
    "Sometimes the response is loaded with the user prompt and a lot of tags when using an open-source models, such as Mixtral from Mistral. This is because of the prompt format being used. The 'assistant' 'role' format does not suit them. This is an optimization to look for and can be easily done by using custom prompt template for such models and these templates are hinted on their huggingface repo."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ip4I4Lff3B2M"
   },
   "source": [
    "## Install & Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "8zVlW-GMcBaU",
    "outputId": "0c473564-fb93-41a9-c819-e6aa2382d75a"
   },
   "outputs": [],
   "source": [
    "!pip install -q gradio anthropic requests torch bitsandbytes transformers accelerate openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "YKVNzE5sFH2l"
   },
   "outputs": [],
   "source": [
    "# imports\n",
    "import re\n",
    "import os\n",
    "import sys\n",
    "import gc\n",
    "import io\n",
    "import json\n",
    "import anthropic\n",
    "import gradio as gr\n",
    "import requests\n",
    "import subprocess\n",
    "import google.generativeai as ggai\n",
    "import torch\n",
    "import tempfile\n",
    "import shutil\n",
    "from io import StringIO\n",
    "import pandas as pd\n",
    "from google.colab import userdata\n",
    "from huggingface_hub import login\n",
    "from openai import OpenAI\n",
    "from pathlib import Path\n",
    "from datetime import datetime\n",
    "from IPython.display import Markdown, display, update_display\n",
    "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LWpD6bZv3mAR"
   },
   "source": [
    "## HuggingFace Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "aeC2oWY2FTv7"
   },
   "outputs": [],
   "source": [
    "# Sign in to HuggingFace Hub\n",
    "\n",
    "hf_token = userdata.get('HF_TOKEN')\n",
    "login(hf_token, add_to_git_credential=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8Au2UPVy3vn5"
   },
   "source": [
    "## Frontier Models configuration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "geBBsd14X3UL"
   },
   "outputs": [],
   "source": [
    "openai_client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))\n",
    "anthropic_client = anthropic.Anthropic(api_key=userdata.get('ANTHROPIC_API_KEY'))\n",
    "ggai.configure(api_key=userdata.get('GOOGLE_API_KEY'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tCnDIOlKgjbO"
   },
   "source": [
    "## Defining Prompts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "gkwXZsxofAU1"
   },
   "outputs": [],
   "source": [
    "system_prompt = \"\"\"\n",
    "You are a synthetic dataset generator. Your role is to create synthetic dataset that infers structured data schemas from business scenarios given by the user.\n",
    "\n",
    "Your task is to:\n",
    "1. Understand the user's business problem(s) or use case(s).\n",
    "2. Identify the key fields needed to support that scenario.\n",
    "3. Define appropriate field names, data types, and formats.\n",
    "4. Generate synthetic records that match the inferred schema.\n",
    "\n",
    "Guidelines:\n",
    "- Use realistic field names and values. Do not invent unrelated fields or values.\n",
    "- Choose sensible data types: string, integer, float, date, boolean, enum, etc.\n",
    "- Respect logical constraints (e.g., age range, date ranges, email formats).\n",
    "- Output the dataset in the format the user requests (json, csv, txt, markdown table).\n",
    "- If the scenario is vague or broad, make reasonable assumptions and explain them briefly before generating the dataset.\n",
    "- Always generate a dataset that supports the business use case logically.\n",
    "\n",
    "Before generating the data, display the inferred schema in a readable format.\n",
    "\"\"\"\n",
    "\n",
    "# trial_user_prompt = \"I’m building a churn prediction model for a telecom company. Can you generate a synthetic dataset with 100 rows?\"\n",
    "def get_user_prompt(business_problem, no_of_samples, file_format):\n",
    "  return f\"\"\"\n",
    "  The business scenario for which I want you to generate a dataset is defined below:\n",
    "  {business_problem}\n",
    "\n",
    "  Generate a synthetic dataset of {no_of_samples} records in {file_format} format.\n",
    "  When generating the dataset, wrap it between the '<<<>>>' tag. Make sure the tag is there in the output.\n",
    "  Do not include any other special characters in between the tags, other than the ones required in producing the correct format of data.\n",
    "  For examples: When a 'csv' format is given, only the ',' character can be used in between the tags.\n",
    "  \"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yNpVf9-oQdoO"
   },
   "source": [
    "### Quanitzation Config"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3ErZ315MQdU3"
   },
   "outputs": [],
   "source": [
    "# This allows us to load the model into memory and use less memory\n",
    "def get_quantization_config():\n",
    "  return BitsAndBytesConfig(\n",
    "      load_in_4bit=True,\n",
    "      bnb_4bit_use_double_quant=True,\n",
    "      bnb_4bit_compute_dtype=torch.bfloat16,\n",
    "      bnb_4bit_quant_type=\"nf4\"\n",
    "  )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "clGtRh0N4951"
   },
   "source": [
    "## HF Model inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "MAhyn1ehb3Dh"
   },
   "outputs": [],
   "source": [
    "# All in one HuggingFace Model Response function\n",
    "def run_hfmodel_and_get_response(prompt, model_name, output_tokens):\n",
    "    tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
    "    tokenizer.pad_token = tokenizer.eos_token\n",
    "    inputs = tokenizer.apply_chat_template(prompt, return_tensors=\"pt\")\n",
    "    if torch.cuda.is_available():\n",
    "      inputs = inputs.to(\"cuda\")\n",
    "    streamer = TextStreamer(tokenizer)\n",
    "    if \"microsoft/bitnet-b1.58-2B-4T\" in model_name:\n",
    "      model = AutoModelForCausalLM.from_pretrained(model_name, device_map=\"auto\", trust_remote_code=True)\n",
    "    elif \"tiiuae/Falcon-E-3B-Instruct\" in model_name:\n",
    "      model = AutoModelForCausalLM.from_pretrained(model_name, device_map=\"auto\", torch_dtype=torch.float16 )\n",
    "    else:\n",
    "      model = AutoModelForCausalLM.from_pretrained(model_name, device_map=\"auto\", quantization_config=get_quantization_config())\n",
    "    outputs = model.generate(inputs, max_new_tokens=output_tokens, streamer=streamer)\n",
    "    response = tokenizer.decode(outputs[0])\n",
    "    del model, inputs, tokenizer, outputs\n",
    "    gc.collect()\n",
    "    torch.cuda.empty_cache()\n",
    "    return response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Gh_Ny1aM-L8z"
   },
   "source": [
    "## Frontier Models Inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "h11WlZNhfHCR"
   },
   "outputs": [],
   "source": [
    "# ChatGPT, Claude and Gemini response function\n",
    "def get_chatgpt_response(prompt, model_name, output_tokens):\n",
    "  response = openai_client.chat.completions.create(\n",
    "        model=model_name,\n",
    "        messages=prompt,\n",
    "        max_tokens=output_tokens,\n",
    "    )\n",
    "  return response.choices[0].message.content\n",
    "\n",
    "def get_claude_response(prompt, model_name, output_tokens):\n",
    "  response = anthropic_client.messages.create(\n",
    "        model=model_name,\n",
    "        max_tokens=output_tokens,\n",
    "        system=system_prompt,\n",
    "        messages=[\n",
    "            {\n",
    "                \"role\": \"user\",\n",
    "                \"content\": prompt,\n",
    "            }\n",
    "        ],\n",
    "    )\n",
    "  return response.content[0].text\n",
    "\n",
    "def get_gemini_response(prompt, model_name, output_tokens):\n",
    "    model = ggai.GenerativeModel(\n",
    "          model_name=model_name,\n",
    "          system_instruction=system_prompt,\n",
    "    )\n",
    "\n",
    "    response = model.generate_content(prompt, generation_config={\n",
    "        \"max_output_tokens\": output_tokens,\n",
    "        \"temperature\": 0.7,\n",
    "    })\n",
    "    return response.text"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "nzHbM_WQvRgT"
   },
   "source": [
    "## Gradio Implementation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uFWZqw1R-al_"
   },
   "source": [
    "### Dropdowns Selection Lists"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "rOzEb0o--aD7"
   },
   "outputs": [],
   "source": [
    "# Dropdown List Values for the user\n",
    "MODEL_TYPES=[\"GPT\", \"Claude\", \"Gemini\", \"HuggingFace\"]\n",
    "OPENAI_MODEL_NAMES=[\"gpt-4o-mini\", \"gpt-4o\", \"gpt-3.5-turbo\"]\n",
    "ANTHROPIC_MODELS=[\"claude-3-7-sonnet-latest\", \"claude-3-5-haiku-latest\", \"claude-3-opus-latest\"]\n",
    "GOOGLE_MODELS=[\"gemini-2.0-flash\", \"gemini-1.5-pro\"]\n",
    "HUGGINGFACE_MODELS=[\n",
    "    \"meta-llama/Llama-3.2-3B-Instruct\",\n",
    "    \"microsoft/bitnet-b1.58-2B-4T\",\n",
    "    \"ByteDance-Seed/Seed-Coder-8B-Instruct\",\n",
    "    \"tiiuae/Falcon-E-3B-Instruct\",\n",
    "    \"Qwen/Qwen2.5-7B-Instruct\"\n",
    "]\n",
    "MODEL_NAMES = {\n",
    "    \"GPT\": OPENAI_MODEL_NAMES,\n",
    "    \"Claude\": ANTHROPIC_MODELS,\n",
    "    \"Gemini\": GOOGLE_MODELS,\n",
    "    \"HuggingFace\": HUGGINGFACE_MODELS\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sbXGL8_4-oKc"
   },
   "source": [
    "### UI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_0NCY7FgCVHj"
   },
   "outputs": [],
   "source": [
    "with gr.Blocks() as generator_ui:\n",
    "    gr.Markdown(\"# 🧠 Business Scenario → Synthetic Dataset Generator\")\n",
    "\n",
    "    with gr.Row():\n",
    "      with gr.Column(scale=3):\n",
    "        with gr.Row():\n",
    "          dataset_size=gr.Number(value=10, label=\"Enter the number of data samples to generate.\", show_label=True)\n",
    "          format=gr.Dropdown([\"json\", \"csv\", \"txt\", \"markdown\"], label=\"Select the format for the dataset\", show_label=True)\n",
    "        with gr.Row():\n",
    "          scenario=gr.Textbox(label=\"Business Scenario\", lines=5, placeholder=\"Describe your business scenario here\")\n",
    "        with gr.Row():\n",
    "          error = gr.Markdown(visible=False)\n",
    "        with gr.Row():\n",
    "          clear = gr.Button(\"Clear Everything\")\n",
    "          submit = gr.Button(\"Generate Dataset\", variant=\"primary\")\n",
    "\n",
    "      with gr.Column(scale=1):\n",
    "          model_type = gr.Dropdown(MODEL_TYPES, label=\"Model Type\", show_label=True, info=\"Select the model type you want to use\")\n",
    "          model_name = gr.Dropdown(MODEL_NAMES[model_type.value], label=\"Model Name\", show_label=True, allow_custom_value=True, info=\"Select the model name or enter one manually\")\n",
    "          output_tokens= gr.Number(value=1000, label=\"Enter the max number of output tokens to generate.\", show_label=True, info=\"This will impact the length of the response containg the dataset\")\n",
    "\n",
    "    with gr.Row():\n",
    "      # Chatbot Interface\n",
    "        chatbot = gr.Chatbot(\n",
    "            type='messages',\n",
    "            label='Chatbot',\n",
    "            show_label=True,\n",
    "            height=300,\n",
    "            resizable=True,\n",
    "            elem_id=\"chatbot\",\n",
    "            avatar_images=(\"🧑\", \"🤖\",)\n",
    "        )\n",
    "    with gr.Row(variant=\"compact\"):\n",
    "      extract_btn = gr.Button(\"Extract and Save Dataset\", variant=\"huggingface\", visible=False)\n",
    "      file_name = gr.Textbox(label=\"Enter file name here (without file extension)\", placeholder=\"e.g. cancer_synthetic, warehouse_synthetic (no digits)\", visible=False)\n",
    "    with gr.Row():\n",
    "      markdown_preview = gr.Markdown(visible = False)\n",
    "      dataset_preview = gr.Textbox(label=\"Dataset Preview\",visible=False)\n",
    "    with gr.Row():\n",
    "      file_saved = gr.Textbox(visible=False)\n",
    "\n",
    "    def run_inference(scenario, model_type, model_name, output_tokens, dataset_size, format):\n",
    "      \"\"\"Run the model and get the response\"\"\"\n",
    "      model_type=model_type.lower()\n",
    "      print(f\"scenario: {scenario}\")\n",
    "      print(f\"model_type: {model_type}\")\n",
    "      print(f\"model_name: {model_name}\")\n",
    "      if not scenario.strip():\n",
    "        return gr.update(value=\"❌ **Error:** Please define a scenario first!\",visible=True), []\n",
    "\n",
    "      user_prompt = get_user_prompt(scenario, dataset_size, format)\n",
    "      prompt =  [\n",
    "          {\"role\": \"system\", \"content\": system_prompt},\n",
    "          {\"role\": \"user\", \"content\": user_prompt},\n",
    "      ]\n",
    "\n",
    "      if model_type == \"gpt\":\n",
    "        response = get_chatgpt_response(prompt=prompt, model_name=model_name, output_tokens=output_tokens)\n",
    "      elif model_type == \"claude\":\n",
    "        response = get_claude_response(prompt=user_prompt, model_name=model_name, output_tokens=output_tokens)\n",
    "      elif model_type == \"gemini\":\n",
    "        response = get_gemini_response(prompt=user_prompt, model_name=model_name, output_tokens=output_tokens)\n",
    "      else:\n",
    "        response = run_hfmodel_and_get_response(prompt=prompt, model_name=model_name, output_tokens=output_tokens)\n",
    "        torch.cuda.empty_cache()\n",
    "      history = [\n",
    "          {\"role\": \"user\", \"content\": scenario},\n",
    "          {\"role\": \"assistant\", \"content\": response}\n",
    "      ]\n",
    "      return gr.update(visible=False), history\n",
    "\n",
    "    def extract_dataset_string(response):\n",
    "      \"\"\"Extract dataset content between defined tags using regex.\"\"\"\n",
    "      # Remove known artificial tokens (common in HuggingFace or Claude)\n",
    "      response = re.sub(r\"<\\[.*?\\]>\", \"\", response)\n",
    "\n",
    "      # Remove system or prompt echo if repeated before dataset\n",
    "      response = re.sub(r\"(?is)^.*?<<<\", \"<<<\", response.strip(), count=1)\n",
    "\n",
    "      # 1. Match strict <<<>>>...<<<>>> tag blocks (use last match)\n",
    "      matches = re.findall(r\"<<<>>>[\\s\\r\\n]*(.*?)[\\s\\r\\n]*<<<>>>\", response, re.DOTALL)\n",
    "      if matches:\n",
    "          return matches[-1].strip()\n",
    "\n",
    "      # 2. Match loose <<< ... >>> format\n",
    "      matches = re.findall(r\"<<<[\\s\\r\\n]*(.*?)[\\s\\r\\n]*>>>\", response, re.DOTALL)\n",
    "      if matches:\n",
    "          return matches[-1].strip()\n",
    "\n",
    "      # 3. Match final fallback: take everything after last <<< as raw data\n",
    "      last_open = response.rfind(\"<<<\")\n",
    "      if last_open != -1:\n",
    "          raw = response[last_open + 3 :].strip()\n",
    "          # Optionally cut off noisy trailing notes, explanations, etc.\n",
    "          raw = re.split(r\"\\n\\s*\\n|Explanation:|Note:|---\", raw)[0]\n",
    "          return raw.strip()\n",
    "\n",
    "      return \"Could not extract dataset! Try again with a different model.\"\n",
    "\n",
    "    def extract_dataset_from_response(chatbot_history, file_name, file_type):\n",
    "      \"\"\"Extract dataset and update in gradio UI components\"\"\"\n",
    "      response = chatbot_history[-1][\"content\"]\n",
    "      if not response:\n",
    "        return gr.update(visible=True, value=\"Could not find LLM Response! Try again.\"), gr.update(visible=False)\n",
    "\n",
    "      # match = re.search(r'<<<\\s*(.*?)\\s*>>>', response, re.DOTALL)\n",
    "      # print(match)\n",
    "      # if match and match.group(1).strip() == \"\":\n",
    "      #   match = re.search(r'<<<>>>\\s*(.*?)\\s*<<<>>>', response, re.DOTALL)\n",
    "      #   print(match)\n",
    "      # if match is None:\n",
    "      #   return gr.update(visible=True, value=\"Could not extract dataset! Try again with a different model.\"), gr.update(visible=False)\n",
    "      # dataset = match.group(1).strip()\n",
    "      dataset = extract_dataset_string(response)\n",
    "      if dataset == \"Could not extract dataset! Try again with a different model.\":\n",
    "        return gr.update(visible=True, value=dataset), gr.update(visible=False)\n",
    "      text = save_dataset(dataset, file_type, file_name)\n",
    "      return gr.update(visible=True, value=text), gr.update(visible=True, value=dataset)\n",
    "\n",
    "    def save_dataset(dataset, file_format, file_name):\n",
    "      \"\"\"Save dataset to a file based on the selected format.\"\"\"\n",
    "      file_name=file_name+\".\"+file_format\n",
    "      print(dataset)\n",
    "      print(file_name)\n",
    "      if file_format == \"json\":\n",
    "        try:\n",
    "          data = json.loads(dataset)\n",
    "          with open(file_name, \"w\", encoding=\"utf-8\") as f:\n",
    "            json.dump(data, f, indent=4)\n",
    "          return \"Dataset saved successfully!\"\n",
    "        except:\n",
    "          return \"Could not save dataset! Try again in another format.\"\n",
    "      elif file_format == \"csv\":\n",
    "        try:\n",
    "          df = pd.read_csv(StringIO(dataset))\n",
    "          df.to_csv(file_name, index=False)\n",
    "          return \"Dataset saved successfully!\"\n",
    "        except:\n",
    "          return \"Could not save dataset! Try again in another format.\"\n",
    "      elif file_format == \"txt\":\n",
    "        try:\n",
    "          with open(file_name, \"w\", encoding=\"utf-8\") as f:\n",
    "            f.write(dataset)\n",
    "          return \"Dataset saved successfully!\"\n",
    "        except:\n",
    "          return \"Could not save dataset! Try again in another format.\"\n",
    "\n",
    "    def clear_chat():\n",
    "      \"\"\"Clear the chat history.\"\"\"\n",
    "      return \"\", [], gr.update(visible=False), gr.update(visible=False)\n",
    "\n",
    "    def show_extract_btn(chatbot_history, format):\n",
    "      \"\"\"Show the extract button if the response has been displayed in the chatbot and format is not set to markdown\"\"\"\n",
    "      if chatbot_history == []:\n",
    "        return gr.update(visible=False), gr.update(visible=False), gr.update(visible=False)\n",
    "      if format == \"markdown\":\n",
    "        return gr.update(visible=True, value=chatbot_history[1][\"content\"]), gr.update(visible=False), gr.update(visible=False)\n",
    "      return gr.update(visible=False), gr.update(visible=True), gr.update(visible=True)\n",
    "\n",
    "    extract_btn.click(\n",
    "        fn=extract_dataset_from_response,\n",
    "        inputs=[chatbot, file_name, format],\n",
    "        outputs=[file_saved, dataset_preview]\n",
    "    )\n",
    "\n",
    "    chatbot.change(\n",
    "        fn=show_extract_btn,\n",
    "        inputs=[chatbot, format],\n",
    "        outputs=[markdown_preview, extract_btn, file_name]\n",
    "    )\n",
    "\n",
    "    model_type.change(\n",
    "        fn=lambda x: gr.update(choices=MODEL_NAMES[x], value=MODEL_NAMES[x][0]),\n",
    "        inputs=[model_type],\n",
    "        outputs=[model_name]\n",
    "    )\n",
    "\n",
    "    submit.click(\n",
    "        fn=run_inference,\n",
    "        inputs=[scenario, model_type, model_name, output_tokens, dataset_size, format],\n",
    "        outputs=[error, chatbot],\n",
    "        show_progress=True\n",
    "    )\n",
    "\n",
    "    clear.click(\n",
    "        clear_chat,\n",
    "        outputs=[scenario, chatbot, dataset_preview, file_saved]\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "id": "kzDUJahK8uRN",
    "outputId": "c5674be2-b262-4439-ae91-4f3e1f49e041"
   },
   "outputs": [],
   "source": [
    "# Example Scenarios\n",
    "\n",
    "# Generate a dataset for predicting customer churn in a subscription-based telecom company. Include features like monthly charges, contract type, tenure (in months), number of support calls, internet usage (in GB), payment method, and whether the customer has churned.\n",
    "# Generate a dataset for training a model to approve/reject loan applications. Include features like loan amount, applicant income, co-applicant income, employment type, credit history (binary), loan term, number of dependents, education level, and loan approval status.\n",
    "# Create a dataset of credit card transactions for detecting fraud. Include transaction ID, amount, timestamp, merchant category, customer location, card presence (yes/no), transaction device type, and fraud label (yes/no).\n",
    "# Generate a dataset of investment customers with fields like portfolio value, age, income bracket, risk appetite (low/medium/high), number of transactions per month, preferred investment types, and risk score.\n",
    "# Create a dataset of hospitalized patients to predict readmission within 30 days. Include patient ID, age, gender, number of prior admissions, diagnosis codes, length of stay, discharge type, medications prescribed, and readmission label.\n",
    "# Generate a dataset for predicting medical appointment no-shows. Include appointment ID, scheduled date, appointment date, lead time (days between scheduling and appointment), SMS reminders sent, patient age, gender, health condition severity, and no-show status.\n",
    "\n",
    "generator_ui.launch(share=True, debug=True, inbrowser=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_9HIC_AzfZBZ"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
